A Post by Michael B. Spring

Computational Thinking (February 24, 2009)

In the February 2009 Communications of the ACM, George H.L. Fletcher and James J. Lu make a compelling argument for introducing Computational Thinking in the K-12 experience (Communications of the ACM, 52(2), pp. 23-25). They reference Jeannette Wing’s call for Computational Thinking, put forward in the Communications of the ACM (49(3), March 2006, pp. 33-35). Both pieces provide compelling and well-reasoned arguments, which I support. I would like to take a few moments to articulate what I envision as a complementary discussion of this topic.

As readers of this blog will know, I have a pet theory about the emergence of a new form of communication based on the use of digital technology, which I call immediacy (see http://www2.sis.pitt.edu/~spring/mBsLOG/blosxom.cgi/2007/09/30#immediacy). The theory is far from entirely my own. I like to think it has some twists that I have contributed, but it builds upon the brilliant insights of Walter Ong put forward in Orality and Literacy: The Technologizing of the Word (Walter J. Ong, Orality and Literacy: The Technologizing of the Word. New Accents. Ed. Terence Hawkes. New York: Methuen, 1988). At a more conversational level, I found Denning and Metcalfe’s Beyond Calculation to be an insightful collection of essays describing the future of computing, with what is beyond calculation being communications (P. J. Denning and B. Metcalfe, Eds., Beyond Calculation. New York: Springer-Verlag, 1997). I would like to put forward two propositions on computational thinking, one related to communications and one related to information, a concept which I have addressed elsewhere (see http://www2.sis.pitt.edu/~spring/mBsLOG/blosxom.cgi/2008/09/25#Information).

Related to communication, I firmly believe that a new and rich form of communication, complementary to oral and literary communication, will emerge over the coming centuries. It will not supplant orality; we will still tell stories in the epic traditions. It will be a sad day if we come to a time when our experience and the telling of it is not rich with exaggeration, abstraction, repurposing and other aspects of the oral tradition. At the same time, the educational system will need to develop techniques to help students come to grips with the awesome power of immediacy in the same way it helps us to become conversant with literacy. As with literacy, this will involve education both in mechanics, such as writing and composition, and in examples, such as great literature. I have a suspicion that the Beowulf of the era of immediacy may well be Gordon Bell’s MyLifeBits project (see http://research.microsoft.com/en-us/projects/mylifebits/). It is a great and concerted effort to develop a monumental collection that will tell the story of one man’s life and efforts to communicate it.

Somewhere in the future, a child will be born whose whole life experience will be captured and digitized so as to form an intimate memory supplement far beyond anything that Vannevar Bush might have imagined for his elite scientists. As time goes on, children will master techniques for capturing, organizing, mining, and sharing that personal experience. Some will try and fail, and their communications via the integrated technology will be noisy, uninformative, and boring. Others will achieve a simplicity, clarity, elegance and intensity that will allow the receiver to experience the world and its structure in a way they might never have been able to before. I can imagine a future reader of this piece exclaiming that this presentation is a horrible waste of the power of the communication medium, with its stodgy use of the literary form in a much more expressive medium. All of this is to suggest that while I concur with Fletcher, Lu, and Wing that we should be considering how computational thinking might be integrated into the curriculum, we also need to be thinking about how computational communication should be woven into the curriculum.

Related to information, I would like to look at the issue of computational thinking from the perspective of information. A colleague of mine, Kai Olsen, has been a great proponent of the thesis that we can use the computer only in those areas in which we have formalized knowledge. I suspect that Kai would soften his hypothesis a little after all these years, but the basic tenet would still hold. If we don’t understand something, it is difficult to instruct a computer on how to handle it. The development of the social aspects of the web and the emergence of what some call collective intelligence suggest that first order knowledge may be derived using formalisms applied to secondary phenomena. Let me be more explicit. When the PageRank algorithm is used, we can identify a page that is likely to be relevant to a query, not based on the query, the person executing the query, or the resource identified, but based on the number of “important” links to that resource, where the importance of the links is recursively defined for all of the sources from which those links emanate. Similarly, algorithmic processing of tags, or annotations as I view them more generically, may provide for a taxonomic, or folksonomic, classification of a resource.
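
To make the recursive definition of “important” links a little more concrete, here is a minimal sketch of the idea in Python. It is a textbook-style power iteration, not the production algorithm of any search engine; the four-page link graph, the damping value of 0.85, and the function name pagerank are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by repeatedly passing importance along links.

    links maps each page to the list of pages it links to.
    A page is important if important pages link to it, so the
    scores are recomputed from each other until they settle.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal importance

    for _ in range(iterations):
        # Every page keeps a small baseline share of importance.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's importance among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its importance evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: a, b and d all link to c; c links back to a.
links = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, page c comes out on top because three pages link to it, and page a outranks b and d even though it has only one incoming link, because that link comes from the important page c. Nothing about the content of the pages or the query is consulted, which is the point of the paragraph above.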

Jeannette Wing articulates a brilliant set of examples of computational thinking, well worth the read. Of her half dozen everyday examples, my favorite is “Which line do you stand in at the supermarket?; that’s performance modeling for multi-server systems.” Whether it is caching, prefetching, or redundant design, Wing masterfully suggests ways of thinking about our world that not only enable us to live better, but provide the foundation for formal algorithm and paradigm development, leading to the ability to write “good” programs later in our educational endeavors. If all my graduate students had been children of Jeannette Wing, there is no doubt that teaching client-server systems would not be the enormous challenge it is today. With no intent to denigrate the sheer brilliance of Prof. Wing’s insights, I would like to believe that there is some utility in seeing patterns in raw information. It may be that I am simply suggesting a more inductive approach to seeing patterns in information, to complement her approach, which I liken to finding patterns in information as examples of formalisms I already know.

“Information is not noise, and noise is not information” is a simple, but powerful, tautology. Do we see anything in the pattern of tags that are used to classify an image, or in the annotations associated with comments on a draft document? Recently, I was looking with some of my doctoral students at tags associated via Delicious with a set of resources. We have been searching for algorithms that will reliably separate “informative tags” from “noise.” Our goal in this effort is not to develop a description of the resource, but a classification. (Use of annotation to aid in description is, I believe, a somewhat simpler task than finding terms that aid in classification.) Given a wealth of information, we undertook many of the types of computational thinking that Wing, Lu and Fletcher have suggested. We identified tuples, formalized terminology and identified relationships. And in all of this, we made some small progress. One day, feeling good about the progress we were making, we looked at three lists of terms for a given resource. The first seven of 75 rows were as follows:

Annotation Frequency (AF)	AF*IRF	Inverted Resource Frequency (IRF)
Storage	Storage	Filestorage
Tools	Onlinestorage	Online-storage
Backup	Backup	Onlinestorage
Web	Sync	Store
Online	Comparison	Filehosting
Onlinestorage	Sharing	Harddrive
Sharing	Lifehacker	Drop.io
...	...	...

Our goal was to model and modify existing techniques used elsewhere to see if we could bubble better classification terms to the upper ranges of a list. It may not be immediately apparent from this particular example, but the terms in the AF*IRF column, on average, tend to be better than the terms from the AF column. However, the reason for showing this example lies not in the final results, but in the intermediate results, and this is a good example of the phenomenon. Remember, our goal is classification terms based on a set of annotations provided by many users. Before reading on, take a look at the IRF column and decide what you see that may be useful……

OK, I assume you found one of the two things; the second is not very obvious in this example. Hopefully you see “filestorage” and “onlinestorage”. One of the constraints (a form of computational thinking) is that the terms provided by users are entered as a space-separated list. To avoid having the terms file and storage treated as separate terms, users often simply concatenate them or join them with a hyphen. What we see, and we are in the early stages of confirming our findings, is that these compound terms appear to take a subcategory-category form. That is to say, if we split the words back up, the second term is the main category and the first term is a subcategory. From a classification point of view, this is a rich data source that would be lost using all of the methods we had been using to identify relevant terms. There is actually much more that I am not presenting here related to how these are found numerically and why it is surprising that we found them here, but if further experimentation supports this and a few other hypotheses we have, it will be an important basis for some dissertation work. The key here is that it was not so much the application of computational thinking that was critical. Surely, without the use of computational thinking, we would not have gotten to this point. At the same time, it was an observation about the data or information that led to the critical insight that may lead to new discoveries. I would suggest then that in addition to finding existing patterns in the world, another aspect of computational thinking is the ability to see new patterns in information.
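
The subcategory-category reading of compound tags can be sketched in a few lines of Python. This is only an illustration of the idea, not the method we are actually using: the function name split_compound and the small vocabulary of single-word tags are hypothetical, and a concatenated tag is split only when both halves are already known words.

```python
import re
from collections import defaultdict


def split_compound(tag, vocabulary):
    """Return (subcategory, category) for a two-word compound tag, else None.

    Assumption for this sketch: the second word is the main category and
    the first word is the subcategory.
    """
    tag = tag.lower()
    # Case 1: the user joined two words with a hyphen or underscore.
    parts = re.split(r"[-_]", tag)
    if len(parts) == 2 and all(parts):
        return parts[0], parts[1]
    # Case 2: the user concatenated two words. Try each split point and
    # accept the first one where both halves are known single-word tags.
    for i in range(1, len(tag)):
        head, tail = tag[:i], tag[i:]
        if head in vocabulary and tail in vocabulary:
            return head, tail
    return None


# Hypothetical vocabulary of single-word tags seen elsewhere in the data.
vocabulary = {"file", "online", "storage", "hosting", "hard", "drive", "backup"}
tags = ["Filestorage", "Online-storage", "Onlinestorage", "Filehosting",
        "Harddrive", "Backup", "Drop.io"]

# Group the subcategories found under each main category.
by_category = defaultdict(set)
for tag in tags:
    pair = split_compound(tag, vocabulary)
    if pair is not None:
        subcategory, category = pair
        by_category[category].add(subcategory)

for category in sorted(by_category):
    print(category, sorted(by_category[category]))
```

Taking the first split point that works is a simplification. A tag that can be divided in more than one way would need a better rule, and a real vocabulary would have to come from the tag data itself rather than from a hand-written set.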

By the way, you may be curious to know what the second finding was. If you found it, you are to be congratulated. This example is not as clear as others we have found. The second observation is that IRF surfaces things such as “drop.io”, a proper name: the name of a particular online storage system. If we had shown other examples, it would become clearer that while proper names seldom show up in the AF column, they often show up in the IRF column. Thus, we are tentatively observing that under certain conditions, the IRF column may show not only a superordinate and subordinate classification, but also exemplars. Most importantly, these annotations are not as immediately apparent in the places we would have expected to find them.
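
For readers who want to see how three such columns might be produced, here is a rough sketch in Python. I have not defined IRF in this post, so the sketch assumes, by analogy with inverse document frequency in text retrieval, that IRF is the logarithm of the total number of resources divided by the number of resources carrying the tag. The function name score_tags and the four small resources are made up for illustration and are not our data.

```python
import math
from collections import Counter


def score_tags(annotations, resource):
    """Compute AF, IRF and AF*IRF for every tag on one resource.

    annotations maps a resource name to the list of tags users gave it,
    with one list entry per user who applied the tag.
    Assumed definition: IRF(tag) = log(N / resources carrying the tag).
    """
    n_resources = len(annotations)
    # Count, for each tag, how many distinct resources it appears on.
    resource_freq = Counter()
    for tags in annotations.values():
        resource_freq.update(set(tags))

    af = Counter(annotations[resource])  # how often each tag was applied here
    irf = {tag: math.log(n_resources / resource_freq[tag]) for tag in af}
    af_irf = {tag: af[tag] * irf[tag] for tag in af}
    return af, irf, af_irf


def ranked(scores):
    """Tags from highest to lowest score, ties broken alphabetically."""
    return sorted(scores, key=lambda tag: (-scores[tag], tag))


# Hypothetical annotations for four resources.
annotations = {
    "storage-review": ["storage", "storage", "storage", "tools", "web",
                       "onlinestorage", "onlinestorage", "drop.io"],
    "css-tutorial": ["web", "tools", "design", "web"],
    "editor-list": ["tools", "web", "editor"],
    "backup-guide": ["storage", "backup", "tools"],
}

af, irf, af_irf = score_tags(annotations, "storage-review")
print("AF:    ", ranked(af))
print("AF*IRF:", ranked(af_irf))
print("IRF:   ", ranked(irf))
```

Under this assumed definition, a tag such as “tools” that appears on every resource gets an IRF of zero, while a tag applied once to a single resource, such as a proper name, gets the highest possible IRF. That is consistent with the observation above that proper names seldom appear near the top of the AF column but often appear near the top of the IRF column.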
